Assignment Instructions

Complete all questions below. After completing the assignment, knit your document, and download both your .Rmd and knitted output. Upload your files for peer review.

For each response, include comments detailing your response and what each line does.


library(nycflights13)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Question 1.

Using the nycflights13 dataset, find all flights that departed in July, August, or September using the helper function between().

# keep only flights whose month is 7, 8, or 9; between() includes both endpoints
flights_jul_aug_sep <- flights %>%
  filter(between(month, 7, 9))

flights_jul_aug_sep
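
As a quick check on what between() does, the sketch below compares it with two equivalent filters. The toy tibble is hypothetical and stands in for the flights data only so the result is easy to inspect.

```r
library(dplyr)

# Hypothetical toy data: one row per calendar month
toy <- tibble(month = 1:12)

# between(x, left, right) is shorthand for x >= left & x <= right
via_between <- toy %>% filter(between(month, 7, 9))

# Two equivalent ways to express the same condition
via_in      <- toy %>% filter(month %in% 7:9)
via_compare <- toy %>% filter(month >= 7, month <= 9)

via_between$month
```

All three filters keep the same rows, because between() is inclusive at both ends.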

Question 2.

Using the nycflights13 dataset, sort flights to find the 10 flights that flew the farthest. Put them in order of fastest to slowest.

longest_flights <- flights %>%
  arrange(desc(distance)) %>%                     # sort by distance, farthest first
  head(10) %>%                                    # keep the first 10 rows
  mutate(speed = distance / (air_time / 60)) %>%  # calculate speed in miles per hour
  arrange(desc(speed))                            # order from fastest to slowest

longest_flights
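
One thing to watch with head(10) is ties: if more than 10 flights share the maximum distance, head() keeps whichever 10 happen to come first. The sketch below, on hypothetical toy data, contrasts head() with slice_max(), which keeps tied rows by default.

```r
library(dplyr)

# Hypothetical toy data: three flights tie for the longest distance
toy <- tibble(
  id       = 1:5,
  distance = c(100, 500, 500, 500, 200),  # miles
  air_time = c(30, 60, 50, 75, 40)        # minutes
)

# head() truncates to exactly 2 rows, dropping one of the tied flights
top_head <- toy %>%
  arrange(desc(distance)) %>%
  head(2)

# slice_max() keeps every row tied for the top values (with_ties = TRUE by default)
top_ties <- toy %>%
  slice_max(distance, n = 2) %>%
  mutate(speed = distance / (air_time / 60)) %>%  # miles per hour
  arrange(desc(speed))                            # fastest first

nrow(top_head)
nrow(top_ties)
```

Whether tied rows should be kept depends on how the question is read; setting with_ties = FALSE in slice_max() returns exactly n rows.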

Question 3.

Using the nycflights13 dataset, calculate a new variable called “hr_delay” and arrange the flights dataset in order of the arrival delays in hours (longest delays at the top). Put the new variable you created just before the departure time. Hint: use the experimental argument .before.

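flights_with_hr_delay <- flights %>%
  # convert arrival delay from minutes to hours and place it just before dep_time
  mutate(hr_delay = arr_delay / 60, .before = dep_time) %>%
  # longest delays first; arrange() always puts missing values last
  arrange(desc(hr_delay))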

flights_with_hr_delay
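
The .before argument named in the hint can be tried on a small example. This is a minimal sketch on a hypothetical toy tibble, assuming dplyr 1.0.0 or later, where mutate() accepts .before and .after.

```r
library(dplyr)

# Hypothetical toy data with the same column names as flights
toy <- tibble(
  day       = 1:3,
  dep_time  = c(517, 533, 542),
  arr_delay = c(120, -30, NA)   # minutes; NA stands for a flight with no recorded arrival
)

with_before <- toy %>%
  mutate(hr_delay = arr_delay / 60, .before = dep_time) %>%  # new column lands before dep_time
  arrange(desc(hr_delay))                                    # NA rows sort to the bottom

names(with_before)
```

The new column sits between day and dep_time, so no separate select() or relocate() step is needed to reorder the columns.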

Question 4.

Using the nycflights13 dataset, find the most popular destinations (those with more than 2000 flights) and show the destination, the date info, and the carrier. Then show just the number of flights for each popular destination.

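popular_destinations <- flights %>%
  group_by(dest) %>%                          # one group per destination
  filter(n() > 2000) %>%                      # keep groups with more than 2000 flights
  select(dest, year, month, day, carrier)     # destination, date info, carrier

number_of_flights <- popular_destinations %>%
  group_by(dest) %>%                          # already grouped by dest; repeated for clarity
  summarise(num_flights = n())                # count the flights to each destination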

list(popular_destinations, number_of_flights)
## [[1]]
## # A tibble: 302,969 × 5
## # Groups:   dest [46]
##    dest   year month   day carrier
##    <chr> <int> <int> <int> <chr>  
##  1 IAH    2013     1     1 UA     
##  2 IAH    2013     1     1 UA     
##  3 MIA    2013     1     1 AA     
##  4 ATL    2013     1     1 DL     
##  5 ORD    2013     1     1 UA     
##  6 FLL    2013     1     1 B6     
##  7 IAD    2013     1     1 EV     
##  8 MCO    2013     1     1 B6     
##  9 ORD    2013     1     1 AA     
## 10 PBI    2013     1     1 B6     
## # ℹ 302,959 more rows
## 
## [[2]]
## # A tibble: 46 × 2
##    dest  num_flights
##    <chr>       <int>
##  1 ATL         17215
##  2 AUS          2439
##  3 BNA          6333
##  4 BOS         15508
##  5 BTV          2589
##  6 BUF          4681
##  7 CHS          2884
##  8 CLE          4573
##  9 CLT         14064
## 10 CMH          3524
## # ℹ 36 more rows
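
The per-destination counts can also be produced with count(), which groups, tallies, and ungroups in one step. The sketch below uses a hypothetical toy tibble and a threshold of 1 instead of 2000 so the result is easy to verify by eye.

```r
library(dplyr)

# Hypothetical toy data: one row per flight
toy <- tibble(dest = c("ATL", "ATL", "ATL", "BOS", "BOS", "MIA"))

popular <- toy %>%
  count(dest, name = "num_flights") %>%  # same as group_by(dest) then summarise(num_flights = n())
  filter(num_flights > 1)                # stand-in for the 2000-flight threshold

popular
```

Unlike the grouped filter above, this route returns only the counts, so it suits the second half of the question rather than the first.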

Question 5.

Using the nycflights13 dataset, find the flight information (flight number, origin, destination, carrier, number of flights in the year, and percent late) for the flight numbers with the highest percentage of arrival delays. Only include the flight numbers that have over 100 flights in the year.

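flight_info <- flights %>%
  group_by(flight) %>%                  # one group per flight number
  filter(n() > 100) %>%                 # keep flight numbers with over 100 flights
  summarise(
    # a flight number can be reused across routes and carriers; first() keeps the first one seen
    origin = first(origin),
    dest = first(dest),
    carrier = first(carrier),
    num_flights = n(),
    # share of flights arriving late; na.rm = TRUE skips flights with no recorded arrival delay
    percent_late = mean(arr_delay > 0, na.rm = TRUE) * 100
  ) %>%
  arrange(desc(percent_late))           # highest percentage of late arrivals first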

flight_info
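
Missing values matter when computing percent late: arr_delay is NA for flights with no recorded arrival, and mean() returns NA if any element is NA. The sketch below shows the effect on a hypothetical vector of delays.

```r
# Hypothetical arrival delays in minutes; NA marks a flight with no recorded arrival
arr_delay <- c(15, -5, NA, 30)

# Without na.rm, a single NA makes the whole result NA
pct_with_na <- mean(arr_delay > 0) * 100

# With na.rm = TRUE, the NA is dropped: 2 of the 3 remaining flights were late
pct_no_na <- mean(arr_delay > 0, na.rm = TRUE) * 100

pct_with_na
pct_no_na
```

Because arrange() sorts NA to the bottom, a flight number with even one missing delay would otherwise never appear among the highest percentages.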